Operations

Overview

The SDSS progresses by acquiring imaging data during pristine weather, reducing those data (Chapter 14) and putting them into the operational and science databases, selecting spectroscopic targets, tiling the sky (Chapter 12), drilling the fiber plug plates, acquiring spectroscopic observations, reducing the spectroscopic observations and entering the spectroscopic data into the operational and science databases. The progress of the observations is subject to weather, the phase of the Moon, and so on, but a basic requirement is that it should be possible to acquire spectroscopic data for a given field one month after the photometric data are obtained. There are several reasons for this (discussed in more detail in Chapters 5 and 14), including (1) difficult areas of the sky should be observed in an efficient manner, so that they do not force the operation of the survey into a long-drawn-out end game; (2) the scientific goals of the survey will be better served if the spectroscopy more or less keeps pace with the photometry; and (3) guide stars chosen from the photometric data may have large proper motions.

A major consideration for operations is efficiency and cost-effectiveness. To do the survey in the minimum elapsed time means that the imaging must keep far enough ahead of the spectroscopy that tiling is effective (see the discussion in Chapter 12). Likewise, the Monitor Telescope must keep well ahead of the imaging survey in its establishment of secondary calibration patches around the sky (Chapter 9). Further, it is cost-effective to have the plug plates drilled in large batches, probably two batches of twenty each per month, and not to have to re-drill too many plates. For all of these reasons, a great deal of planning, involving software tools constructed for the purpose (see Section 14.3.10), is underway for survey operations. This chapter summarizes these plans. Further details are given in internal documents by Petravick et al. (1992) and Kron et al. (1996).

The major centers for the survey operation are APO (where the data are acquired) and Fermilab (where they are reduced and archived). This chapter therefore describes the operations and staffing plans for both APO and Fermilab. In addition to the work at these two sites, the operations plan includes the plate drilling and support and further development by the institutions which have supplied major hardware and software components to the SDSS.

The next section describes the overall operation at APO, and is followed by more detailed descriptions of photometric and spectroscopic observing. The last section describes the operations and staffing at Fermilab.

APO Operations: The Nightly Observing Plan

The Survey Strategy system provides a monthly plan for what part of the sky is most urgent to observe at that time, given the overall plan for the survey, both for imaging and for spectroscopy. Although the monthly plan is devised at Fermilab, the detailed decisions on a night-to-night basis must be made at APO according to weather and other local conditions.

This level of decision-making responsibility means that the observers at APO must have extensive experience in astronomical observations. Indeed, they should have extensive experience in running the SDSS itself. Hence, we will seek not only highly skilled, versatile individuals who have a good understanding of the scientific requirements of the survey, but also individuals who will find the job attractive for the long term. We are confident that such people exist; every observatory has seen examples, and we are hopeful that we will be able to recruit a number of them. At the present time (December 1996) hiring is underway for these positions.

During a night of photometry (there will be, on average, about two whole nights of photometry per month), long scans are much more efficient than many short ones. Thus, if all is going well during a long scan, the observers need only check the status of the instruments and the atmosphere to determine whether the scan continues to be worthwhile. The systems to be checked include the 10 µm all-sky camera; the data stream from the Monitor Telescope and the current extinction coefficients; the hardware status of the camera itself; and various measures of the online data from the camera such as the flat-field vectors, the night sky brightness, the point-spread function derived from bright stars, the focus log, and various engineering health data. All of these activities take place within the 2.5-m control room in the Operations Building at APO.

For a night of spectroscopy, operations begin the day before with the preparation of the fiber cartridges (cf. the discussion in Section 11.6.0.1). Up to ten cartridges can be plugged, and the fields are chosen according to a plan that attempts to schedule observations near the meridian. Hence, ideally, the targeted fields are spaced about one hour of right ascension apart, and in any case are arranged in a particular temporal sequence. The daytime work includes the determination of which fiber was plugged into which hole using the fiber mapper, and other bookkeeping as necessary. The cartridges are stored in a space with doors to both the plugging room and the outside. At night, the outside door is opened to allow the cartridges to equilibrate to the temperature of the ambient air. During the night, they are accessible from the outside of this building for convenience in transportation via a cart to the 2.5-m telescope. The handling system is designed so that one person can dismount the previous cartridge and mount the next one in about 5 minutes. A detailed time breakdown for the operations involved in acquiring and calibrating spectra is presented in Section 11.8.

Besides swapping the fiber cartridges, the activities during a night of spectroscopy include monitoring the quality of the exposures. The exposure will be subdivided into three sub-exposures. Doing this helps eliminate cosmic rays and allows a direct evaluation of the throughput before the end of the last exposure, the length of which can be adjusted to keep the quality of the spectra uniform from field to field. A trimmed-down version of the spectroscopic pipeline will also run on samples of the data to check on their quality.
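
As an illustration of how the final sub-exposure might be scaled to the accumulated signal, the sketch below estimates the additional integration time needed to reach a target signal-to-noise ratio from the counts already in hand. It is a minimal photon-noise calculation under assumed numbers (the function name, the example counts, and the sky rate are all illustrative), not the survey's exposure-control algorithm.

```python
# Hypothetical sketch (not the SDSS observing code): estimate how much longer
# the final sub-exposure must run for the summed spectrum to reach a uniform
# target signal-to-noise ratio from field to field.

def remaining_exposure_time(counts_so_far, elapsed_s, target_sn, sky_rate=1.0):
    """Additional integration time (s) needed to reach target_sn.

    counts_so_far : background-subtracted object counts accumulated so far
    elapsed_s     : exposure time already accumulated (s)
    target_sn     : desired signal-to-noise ratio for the summed exposure
    sky_rate      : assumed sky counts per second in the extraction aperture
    """
    obj_rate = counts_so_far / elapsed_s
    # Photon-limited S/N for total time t:  S/N = obj_rate*t / sqrt((obj_rate + sky_rate)*t)
    t_total = target_sn**2 * (obj_rate + sky_rate) / obj_rate**2
    return max(t_total - elapsed_s, 0.0)

# Example: a faint target has accumulated 400 counts after two 15-minute
# sub-exposures; roughly how much longer to reach S/N = 12?
extra = remaining_exposure_time(counts_so_far=400.0, elapsed_s=1800.0, target_sn=12.0)
print(f"additional exposure needed: {extra / 60:.0f} minutes")   # ~29 minutes
```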

Clearly, a night of spectroscopic work will be quite hectic even when clouds are not making things more difficult, and we project that two people will be required to run the spectroscopic operations. Two people are also required for running the imaging survey: one to work on data quality checks and the other to monitor the status of the hardware. We therefore plan on a staff of two people for nighttime operations. These observers must also be able to change between photometric and spectroscopic observations during a night, and to make use of partial nights (before the Moon comes up, for example) on a routine basis.

Of course, one can only guess at the evening weather patterns from weather reports and conditions in the afternoon. Therefore, even on afternoons for which we anticipate doing photometry, we will have a set of spectroscopic cartridges prepared in case the weather turns out not to be absolutely pristine.

There will routinely be instances where a question or problem arises during the night that cannot be adequately addressed at the time. A log is kept of these issues to be resolved by others the following day. Also, the night crew must leave instructions for the disposition of the fiber cartridges, according to their best judgment of the success of each exposure, and, accordingly, which new fields must be prepared for the next night of spectroscopy. The daytime "problem solver" must have a skill mix similar to that of the observers, and one model is that there are a total of four observers who rotate on a schedule such that at any given time, two are on duty at night and one in the day. The daytime "observer" may also have the responsibilities of supervising the fiber handling staff, supervising the disposition of the media written the previous night, and so on. APO has many individuals working on the ARC 3.5-m and other telescopes, and it is likely that time-sharing of technical support will be undertaken to take advantage of economies of scale.

A staff of 12.5 FTEs will be resident at Apache Point Observatory to operate the survey once routine operating status has been achieved. These individuals are 0.5 of the Site Manager (who also manages staff for the 3.5 meter ARC telescope), 2.5 engineers, 5 observers, 3 technicians (including the fiber pluggers), and 1.5 clerical and financial support FTEs. An additional staff of 4 to 5 engineers will be available to the site manager but resident at their home institutions, to provide expert assistance on an as-needed, first-call basis. Apart from staff costs (about 60% of the total budget), the other site operations costs are based on the known costs of operating the ARC 3.5 meter telescope at this site (telephone, utilities, liquid nitrogen delivery, insurance, housing, etc.) and are detailed in Chapter 17.

One of the observers will be designated as the APO operations scientist and will be responsible for all science decisions and for verifying that all aspects of the operations are in order. If equipment problems are found, the Site Manager will provide the required support, guiding the support staff as required.

Photometric Operations

Summary

This section is based on a detailed operations plan for the imaging camera by Bartlett et al. (1995). It describes a night's photometric observations, including the diagnostics used to verify that the observations are going well. The diagnostic tools are mostly contained within IOP, the Imaging Observer's Program.

Under weather conditions in which it is expected that photometric observations can begin after sunset or moonset, the camera should be mounted on the telescope well in advance to allow the startup operations to be done before the sky gets dark enough for observing. The exception to this, of course, is a night which starts out "spectroscopic", but then conditions improve enough to do photometry. Inevitably, about half an hour of observing time will be lost to mounting the camera and startup in this case.

Bias data to test the instrument should be taken before and after mounting. The camera is then focused and bias data for the data reduction pipelines taken. Tracking data should be taken for at least 8.3 minutes (the transit time from the top of the leading astrometric array to the bottom of the trailing astrometric array) to check camera rotation and telescope tracking.
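
As a rough consistency check (not taken from the document), the 8.3-minute figure is simply the time for a star to drift, at the sidereal rate, across the focal-plane span between the leading and trailing astrometric arrays; the ~2.1 degree span used below is inferred from the quoted transit time rather than from any stated camera dimension.

```python
# Rough check of the 8.3-minute tracking test: in a drift scan at the sidereal
# rate, the transit time is the angular span of the focal plane divided by the
# tracking rate on the sky.

import math

SIDEREAL_RATE_ARCSEC_PER_S = 15.041   # sidereal rate on the sky at dec = 0

def transit_time_minutes(span_deg, dec_deg=0.0):
    """Time for a star at declination dec_deg to drift across span_deg."""
    rate = SIDEREAL_RATE_ARCSEC_PER_S * math.cos(math.radians(dec_deg))
    return span_deg * 3600.0 / rate / 60.0

# A span of about 2.08 degrees (inferred from the quoted 8.3 minutes, not a
# documented camera dimension) reproduces the transit time on the equator:
print(f"{transit_time_minutes(2.08):.1f} minutes")   # ~8.3
```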

At this point, normal photometric observing can begin and no more direct action is required. The observers should monitor the observing conditions, the camera diagnostics and the telescope diagnostics to be sure that all is going well and to decide when photometric data taking is to be terminated. At the end of the photometric session, another set of bias data is taken, the camera is shuttered, removed from the telescope and stowed. The cryogens are refilled and checked to be sure that the camera will remain cold. Bias data are also taken after the camera is dismounted and stowed, to determine that it is still in working condition. The data tapes are copied and shipped to Fermilab.

It is anticipated that photometric observations will be conducted whenever conditions permit. The criteria are:

Quality Checks during Observing

Among the diagnostic and monitoring tools available to the observers are:

  1. Data on the atmospheric cloud cover and transparency from the 10 µm camera (Chapter 7) and from the Monitor Telescope (Chapter 9).
  2. Continuous data on the sky brightness from the photometric data acquisition (DA) system.
  3. Continuous measures of the PSF from the imaging camera DA system.
  4. Continuous measurement of the focus and tracking from the focus CCD and the astrometric arrays.
  5. Monitors measuring the voltages and temperatures within the camera.
  6. Scrolling displays showing the output of a selected subset of the photometric CCDs.

Observing

The observing sequence on a photometric night goes something like this:

  1. Obtain a global plan from Survey Strategy which specifies the areas of the sky to be observed. This plan must have built-in flexibility because of the vagaries of weather.
  2. Load the tape drives, write the headers, and verify that enough tapes are available to contain a night's worth of data.
  3. Fill the liquid nitrogen reservoir and verify that there is enough cryogen to complete the night's program.
  4. Purge files from the previous night's observing from the DA system.
  5. Run diagnostic checks, and take calibration and bias data.
  6. Take dome flats (as a diagnostic, and to allow flattening of the data from the astrometric chips).
  7. Withdraw the telescope enclosure and run diagnostic checks on the telescope.
  8. Do an initial focus.
  9. Send the list of the night's calibration patches and the optimum times to observe them to the Monitor Telescope.
  10. Begin the scan.
  11. Monitor the weather conditions, the camera diagnostics, and the sky quartiles and PSFs from the data system.
  12. End the scan, close the shutters and dismount and stow the camera.
  13. Prepare and ship the Fermilab tape copies and update the tape log.

On-Site Maintenance

When the construction and testing of the instrument in Princeton is finished, roughly in March 1997, the camera will be partly disassembled and shipped to APO, where it will be reassembled in a clean room set up for the purpose. There, the camera will be extensively tested and shaken down using the DA system, and the sensitivity of every CCD will be measured through the corrector plate and filter using a 50 Å resolution monochromator constructed by the JPG. This set-up will remain in place throughout the duration of the SDSS. The camera sensitivities will be re-measured on a regular basis, and the clean room is available for testing, storage and, should it prove necessary, disassembly and repair. A complete set of spare parts is maintained at APO. The camera construction includes the equivalent of a back-up dewar containing 5 CCDs, which can replace any of the six operational dewars if necessary.

Monitor Telescope Operations

This section is taken from a detailed document by McKay (1995). The Monitor Telescope has two functions: (1) to observe stars over a wide range of airmass during both spectroscopic and photometric observations, to determine the atmospheric extinction as a function of time and color; and (2) to set up a network of patches of secondary standard stars for the calibration of the photometric data. The latter will be done on all photometric nights. Ideally, the secondary patches will be calibrated before, or, at the least, on the same night as they are observed by the photometric camera. The MT is run by MOP (the Monitor Telescope Observer's Program), which also contains diagnostic tools. The Monitor Telescope pipeline, MTpipe, will likely be run at APO as well as at Fermilab to provide real-time measurements of the extinction and photometric integrity.

Each night, the MT observing plan contains a list of targets with priorities. The calibration patches to be scanned by the camera have the highest priority.

MT operations begin with a daycheck sequence, which checks that the telescope and the dome are fully operational and includes filling the camera dewar with liquid nitrogen. Next, the calibration data, consisting of observations of a flat-field screen and the measurement of bias and dark frames, are taken. The telescope dome louvers are now opened to allow the telescope to cool down in the early evening.

The MT will observe six primary standard stars and three secondary transfer patches, in each of the five survey bands, per hour. The latter are provided by the imaging observing plan as described above. When there are no pending secondary fields, the MT will observe primary standards. The network of standard stars is described by Fukugita et al. (1996). MTpipe will calculate extinction in real time and provide this information to the 2.5 meter observers. At the end of the night, the data are copied to tape for shipment with the 2.5 meter data to Fermilab.

Spectroscopic Operations

Introduction

This section, based on a detailed document by Uomoto (1995), describes the practical aspects of obtaining SDSS spectroscopic data, from target selection to data delivery. Most of these activities produce a computer file, and those files are examined closely here.

Target Selection

Plugplate Target File

Spectroscopic targets are selected according to criteria set by the science working groups. A tiling run optimizes the plugplate center coordinates. The plugplates are then populated with targets. The list of objects on a particular plugplate is the plugplate target file.

This file is called out and saved because it is the last point where targets can be selected, deleted, or modified before drilling. The file contains the following quantities. The field center is obvious. The tiling run is the unique job number that produced the tiling solution for this plate and its siblings. The exposure time is computed from the expected atmospheric extinction and reddening. The object records include a name; the ra and dec; the proper motion (in mas/yr) and epoch, useful for guide stars with known proper motions and for high proper motion targets; type, which is one of galaxy, star, quasar, standard, sky, guidestar, or lighttrap; photom, the magnitude and color (g' and g'-r'); and flags, one or more of duplicate, AGN, color, propermotion, and other quantities. A lighttrap is a plugplate hole at a bright star. This hole will not receive a fiber but will allow light to pass through the plugplate rather than being reflected and possibly contaminating faint object observations.

The duplicate flag is set when an object appears on more than one plugplate. This can be useful for quality analysis (i.e., to see if two spectra of a given object are consistent with one another). The AGN and color flags might be set if the object was selected because of its AGN properties (such as a stellar nucleus as opposed to a usual color-selected quasar), or if a star was selected because of its unusual color or proper motion. We can investigate the effectiveness of our selection criteria by looking at the flags.
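
A minimal sketch of how one object record in the plugplate target file might be represented follows. The field names, types, and the dataclass layout are purely illustrative; the actual file format is defined by the survey software.

```python
# Illustrative sketch only: the field names below simply mirror the quantities
# listed in the text, not the actual plugplate target file format.

from dataclasses import dataclass, field
from typing import List

OBJECT_TYPES = {"galaxy", "star", "quasar", "standard", "sky", "guidestar", "lighttrap"}

@dataclass
class PlugplateTarget:
    name: str
    ra: float             # degrees
    dec: float            # degrees
    pm_ra: float          # proper motion, mas/yr
    pm_dec: float         # proper motion, mas/yr
    epoch: float          # epoch of the coordinates
    objtype: str          # one of OBJECT_TYPES
    g_mag: float          # photom: g' magnitude
    g_minus_r: float      # photom: g'-r' color
    flags: List[str] = field(default_factory=list)   # duplicate, AGN, color, ...

    def __post_init__(self):
        if self.objtype not in OBJECT_TYPES:
            raise ValueError(f"unknown object type: {self.objtype}")

# Example record (hypothetical): a color-selected quasar candidate that also
# appears on a neighboring plate.
target = PlugplateTarget("SDSS_J0000", 10.684, 41.269, 0.0, 0.0, 1996.8,
                         "quasar", 19.2, 0.15, ["color", "duplicate"])
print(target.objtype, target.flags)
```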

Plugplate Drilling File

The plugplate target file is the input to drilling routines that create the plugplate drilling file. The drilling routines map ra and dec to x, y positions on the plugplate using information about the telescope optics (for geometric distortions), anticipated temperature at the observatory (to calculate the plate scale), and the declination and anticipated hour angle (to calculate the atmospheric refraction correction).

The plugplate drilling file contains all the information in the plugplate target file plus additional information for drilling, mostly in the header. We carry through the accumulated information even if it is not immediately useful. For example, the astronomical coordinates are not important at this stage but they appear here. The intent is to simplify the development and maintenance of programs using the data, so that a single input and output data stream can be used.

The drilling algorithm uses the predicted temperature at which the plugplate will be drilled. The difference between this predicted temperature and the actual temperature at the telescope should be small enough that the correct scale can be set at the telescope; the scale can be adjusted over a narrow range by manipulating the optics.
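
The sketch below illustrates the kind of mapping the drilling routines perform, using only a tangent-plane projection and a thermal-expansion scale factor. The plate scale value, the expansion coefficient, and the omission of the optical-distortion and refraction corrections are all simplifying assumptions; the real routines include those corrections.

```python
# Minimal sketch, not the SDSS drilling code: map (ra, dec) onto the plugplate
# with a gnomonic (tangent-plane) projection and a temperature-dependent scale.
# The nominal plate scale and the aluminum expansion coefficient are assumed
# illustrative values.

import math

NOMINAL_SCALE_MM_PER_DEG = 217.7   # assumed plate scale of the 2.5-m telescope
ALU_EXPANSION_PER_DEGC = 2.3e-5    # linear expansion coefficient of aluminum

def radec_to_xy(ra, dec, ra0, dec0, drill_temp_c, obs_temp_c):
    """Sky coordinates (deg) to plate (x, y) in mm for a field centered at (ra0, dec0)."""
    ra, dec, ra0, dec0 = map(math.radians, (ra, dec, ra0, dec0))
    cos_c = (math.sin(dec0) * math.sin(dec)
             + math.cos(dec0) * math.cos(dec) * math.cos(ra - ra0))
    xi = math.cos(dec) * math.sin(ra - ra0) / cos_c                    # standard coordinates
    eta = (math.cos(dec0) * math.sin(dec)
           - math.sin(dec0) * math.cos(dec) * math.cos(ra - ra0)) / cos_c
    # Holes are drilled at drill_temp_c but must land on their targets at the
    # anticipated observing temperature; scale the drilled pattern accordingly.
    scale = NOMINAL_SCALE_MM_PER_DEG * (1.0 + ALU_EXPANSION_PER_DEGC
                                        * (drill_temp_c - obs_temp_c))
    return math.degrees(xi) * scale, math.degrees(eta) * scale

# A target 0.5 degree east of the field center, plate drilled at 22 C for use at 5 C:
print(radec_to_xy(180.5, 30.0, 180.0, 30.0, drill_temp_c=22.0, obs_temp_c=5.0))
```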

The plugplate drilling file is transmitted via FTP to the observer's workstation at Apache Point along with a checksum. The observer's workstation at Apache Point automatically verifies correct receipt or sends a checksum error message back to Fermilab. The name of the file is of the form 001234.drl where the 001234 is the plugplate identifier noted in the header. This makes it easy to find the file for a particular plugplate. The checksum information guarantees that the files have not been modified accidentally after receipt at Apache Point. Inadvertent changes by humans are easily caught this way.
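
A sketch of the automatic receipt check is given below. The document specifies only that a checksum accompanies each drilling file; the particular checksum algorithm (MD5 here), the companion-file naming, and the notification hook are assumptions.

```python
# Sketch of the automatic receipt check described above.  The checksum
# algorithm and the ".chk" companion-file convention are assumptions.

import hashlib
from pathlib import Path

def verify_drilling_file(plate_id: int, incoming_dir: Path) -> bool:
    """Return True if 00NNNN.drl matches the checksum shipped with it."""
    drl = incoming_dir / f"{plate_id:06d}.drl"                   # e.g. 001234.drl
    expected = (incoming_dir / f"{plate_id:06d}.chk").read_text().strip()
    actual = hashlib.md5(drl.read_bytes()).hexdigest()
    return actual == expected

# if not verify_drilling_file(1234, Path("/incoming/plates")):
#     send_checksum_error_to_fermilab(1234)    # hypothetical notification hook
```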

CNC Drilling File

The plugplate drilling file is the source for the machinist's drilling program, the CNC drilling file. This file is generated directly from the file residing on the observer's workstation at Apache Point and delivered to the machinist. It is a one-to-one translation of the plugplate drilling file, a format conversion only. No coordinate transforms are done by this program. The expected delivery date of the CNC drilling file is sent to Survey Strategy where tracking and followup occur.

Drilling

The plugplates are drilled and the plugplate identification number is manually stamped on the edge. The plugplates are blackened on the sky side and have a pushing post of radius 3.2 mm in the middle, preventing fiber placement in that region.

Plugplate Receipt

Daytime staff at Apache Point receive the plugplates and inspect them for damage. Receipt and condition information is transmitted to Survey Strategy, which takes any necessary action (replaces bad plugplates, for example).

Plugging and Unplugging

Marking plugplates

Holes on the plugplate are used for a variety of purposes; not all get spectroscopic fibers. Some are for guide stars, and other, smaller holes trap light from bright stars. There is also a large (0.25-inch) hole for a coherent sky bundle and perhaps 10 holes for drilling quality control. These must be marked before plugging. A viewgraph transparency generated from the plugplate drilling file is projected onto the plugplates so that the guide star, light trap, and quality control holes can be manually marked. The software for printing the transparency is on the fiber mapper because this is the system used by the daytime plugging staff. The plugplate drilling file is accessed from the observer's workstation via NFS.

Plugplate Selection, Plugging and Mapping

Survey Strategy sends to Apache Point the list of plugplates that should be ready for the next spectroscopic night.

Daytime staff at Apache Point plug the plates with spectroscopic fibers, guide star fibers, and light traps.

After plugging, the fiber cartridge is presented to the fiber mapper, which matches the object name to the spectrum location on the spectrograph CCD. The input is the plugplate drilling file and the output is the mapping file. The operator installs the fiber cartridge in the fiber mapper and types the plugplate identification number. The mapper goes to the observer's workstation and opens the corresponding plugplate drilling file. It performs (or instigates on the workstation) the checksum verification to make sure the file hasn't been damaged or modified since it left Fermilab.

The mapper then scans the fiber cartridge to determine which fiber corresponds to which object in the plugplate drilling file. It writes the mapping file, which for the most part looks like the plugplate drilling file. Appended to each non-light trap object record are two numbers: the slithead identification and the spectrum number. Each slithead is coded with a 7-bit identification which tracks slithead modifications, dead fiber status, and repairs; each time the slithead is modified, the number is changed. On each cartridge, one slithead has an even identification, the other odd.

A 4-digit fiber cartridge identifier is made by writing the two slithead identification numbers as 2-digit hexadecimal numbers and concatenating them, even number last. Thus, a fiber cartridge that holds slitheads number 5 and 10 would be called 050a. The mapping file, whose name would be 050a.map in this example, is written to the observer's workstation along with a checksum for automatic verification. When the mapping file is safely stored on the observer's workstation, a message is sent to Survey Strategy alerting them that the fiber cartridge is available for nighttime use.
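
The encoding just described is simple enough to state as code; the sketch below is an illustrative implementation (the function name is ours, not the mapper software's).

```python
# The cartridge identifier described above: each slithead's 7-bit ID is
# written as a 2-digit hexadecimal number and the two are concatenated with
# the even-numbered slithead last.

def cartridge_id(slithead_a: int, slithead_b: int) -> str:
    # One slithead on each cartridge is even, the other odd; put the odd one first.
    odd, even = sorted((slithead_a, slithead_b), key=lambda n: n % 2, reverse=True)
    return f"{odd:02x}{even:02x}"

print(cartridge_id(5, 10))    # -> "050a", giving mapping file 050a.map
```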

Unplugging

When the spectroscopic observation has successfully been obtained, or the plugplate expires (because the field is no longer reachable), the daytime staff removes the plugplate from the fiber cartridge. Notification is sent to Survey Strategy.

Observing

Nightly Lineup

Before the beginning of every night, the spectroscopic lineup is written by the Apache Point observer. At the least, it shows the civil time range during which each cartridge can be used, given airmass constraints. At this time, the existence of the related .map files should be confirmed.
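
The airmass constraint translates directly into an hour-angle window around transit, from which the civil time range in the lineup can be derived. The sketch below shows that calculation under assumptions (a plane-parallel sec z airmass and an illustrative airmass limit); it is not the lineup software itself.

```python
# Sketch only: the hour-angle window within which a field stays below a given
# airmass limit, assuming airmass = sec(z).  The airmass limit shown is
# illustrative, not a survey requirement.

import math

APO_LATITUDE_DEG = 32.78        # Apache Point Observatory

def hour_angle_window(dec_deg, airmass_limit=1.4, lat_deg=APO_LATITUDE_DEG):
    """Half-width (hours) of the hour-angle window for a field at dec_deg."""
    lat, dec = math.radians(lat_deg), math.radians(dec_deg)
    cos_z_min = 1.0 / airmass_limit       # altitude required to meet the limit
    cos_ha = (cos_z_min - math.sin(lat) * math.sin(dec)) / (math.cos(lat) * math.cos(dec))
    if cos_ha >= 1.0:
        return 0.0                        # never rises above the airmass limit
    if cos_ha <= -1.0:
        return 12.0                       # above the limit at all hour angles
    return math.degrees(math.acos(cos_ha)) / 15.0     # degrees -> hours

# A field at dec = +30 can be observed within about this many hours of transit:
print(f"+/- {hour_angle_window(30.0):.1f} h")   # ~3.5 h
```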

Decide: Imaging or Spectroscopy

The Apache Point observer decides at sunset whether the night starts with imaging or spectroscopy.

Load a cartridge

The nighttime staff selects a cartridge and latches it to the telescope. If this is the first spectroscopic observation after an imaging run, the spectroscopic corrector must be installed. The corrector is carried above the plugplate on any of the ten fiber cartridges and is latched onto the telescope at the beginning of a spectroscopic run. It is removed when the imaging camera is installed.

Setup for Calibrations and Exposure

The observer's program reads the fiber cartridge identification number from the spectrographs and opens the associated mapping file. The mapping file contains the plate coordinates, valid hour angles, guide star locations, intended temperature, etc. It passes the intended temperature information to the telescope and instructs it to point, scale, and focus for the estimated mid hour angle of the exposure.

Simultaneously, the observer's program adjusts the spectrograph focus and alignment accounting for the temperature and the slithead characteristics. When the telescope is ready, the flat field screen is extended and a short calibration lamp exposure checks the image position and focus. The observer examines the placement of one or two sets of emission lines on each CCD to make sure the spectra are properly positioned. Adjustments are made if necessary by moving the spectrograph collimator mirror. If the observer feels the need, a Hartmann exposure is made to check the focus and the focus is adjusted. When all is properly set up, a wavelength and flat field exposure set is taken and recorded at this telescope position.

The observer's program then tells the telescope to retract the calibration screen and point to the field center. It also transmits the guide star coordinates to the telescope. The guider picks up the guide stars and the telescope is instructed to center up, adjust the scale and rotation, and start guiding.

If telescope pointing turns out not to be accurate enough to find the guide stars in the guide fibers, an offset from a star in the large coherent bundle can be done prior to setup.

Spectrophotometry Sequence

When the observer's program finds that the telescope is set up on the field and guiding, it begins a spectrophotometric exposure sequence. The observer's program moves the telescope in a 2x2 raster, each position offset from the center by half a fiber diameter first in altitude, then in azimuth, stopping at each location for a short exposure. These exposures are about 1 minute long and are heavily binned. These data are used to check the photometric zero points against the predicted count rate (are we even close to being centered up?) and act as a sanity check on the centering and drilling algorithms.

They will also be used to provide spectrophotometric calibrations. Comparison of the four exposures allows a detailed calculation of fiber losses; thus, normalizing the redshift exposures to photometric levels indicated by the spectrophotometry exposures will be a first-order correction for slit losses.
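
The sketch below illustrates the raster geometry and a crude way the four count totals could be compared to estimate a centering error. The 3-arcsecond fiber diameter, the corner layout, and the quad-cell-style estimate are assumptions for illustration, not the survey's algorithm.

```python
# Sketch only: the fiber diameter, the corner layout, and the quad-cell-style
# centering estimate below are illustrative assumptions.

FIBER_DIAMETER_ARCSEC = 3.0                     # assumed fiber diameter
HALF = FIBER_DIAMETER_ARCSEC / 2.0

# (d_alt, d_az) offsets of the four raster positions, in arcseconds
RASTER = [(+HALF, +HALF), (+HALF, -HALF), (-HALF, +HALF), (-HALF, -HALF)]

def centering_estimate(counts):
    """Crude quad-cell estimate of the pointing offset from four count totals.

    counts : four fiber count totals, in the same order as RASTER.
    Returns an (alt, az) offset estimate in arcseconds.
    """
    total = float(sum(counts))
    d_alt = HALF * (counts[0] + counts[1] - counts[2] - counts[3]) / total
    d_az = HALF * (counts[0] + counts[2] - counts[1] - counts[3]) / total
    return d_alt, d_az

# A perfectly centered target gives equal counts and hence zero estimated offset:
print(centering_estimate([1000, 1000, 1000, 1000]))     # (0.0, 0.0)
```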

Redshift exposure

The observer's program then begins the first of three spectroscopic exposures. While exposing, the observer's program queries the telescope for the following guide star information: count rate for each guide star, FWHM (or other seeing indicator) for each guide star, sky brightness from the large coherent bundle, and telescope pointing error. The query rate will be something like every 10-30 seconds. At the least, this information is reported to the observer; at best, it will be used to adjust the exposure time automatically.

Decide again

At the end of each spectroscopic exposure a decision to stay with spectroscopy or change to imaging mode is made by the Apache Point observer. Sky condition information comes not only from visual inspections but from the IR camera (which is useful for seeing clouds on moonless nights) and the monitor telescope, which presents the transparency, extinction, and stability of the sky. The guide cameras on the spectrographs are useful for adjusting spectroscopic exposure times but are not sufficiently accurate to help in the imaging versus spectroscopy question.

Special Sequences

Spectrograph focus

The spectrograph is focused by comparing two discharge lamp exposures taken with Hartmann masks in place. We expect that an automatic sequence will be written to determine and set the best focus.

The spectrographs actually have three focus adjustments. The most important is the collimator to slithead distance, which is controlled by three motors (we can also aim the collimator). Each of the two spectrograph cameras also has a manual focus adjustment that moves the CCD with respect to the big lenses of the camera. During normal operations, only the collimator mirror adjustment is needed.

Dispersion curves

The low order coefficients of the dispersion curves will be measured directly from the discharge lamp calibration exposures. The higher order coefficients are computed from high signal to noise exposures obtained less frequently. Some set of long exposures of specific calibration discharge lamps will be obtained maybe once per night. These will be delivered to the spectroscopic data reduction system for analysis.

Bias exposures

A set of bias exposures, medianed in groups of three to eliminate radiation hits and then averaged to increase the signal-to-noise ratio, is taken at regular intervals. These may or may not be used in data reduction.
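
A minimal sketch of the combination just described (median groups of three to reject radiation hits, then average the medians) might look like the following; the array shapes and synthetic data are illustrative.

```python
# Sketch of the bias combination described above: median groups of three
# frames to reject radiation hits, then average the medianed groups.

import numpy as np

def combine_bias(frames):
    """frames: list of 2-D arrays whose length is a multiple of three."""
    frames = np.asarray(frames, dtype=np.float64)
    groups = frames.reshape(-1, 3, *frames.shape[1:])    # (ngroups, 3, ny, nx)
    medians = np.median(groups, axis=1)                  # radiation hits removed
    return medians.mean(axis=0)                          # higher signal-to-noise

# Example with nine small synthetic frames, one of which has a "radiation hit":
rng = np.random.default_rng(0)
frames = [1000.0 + rng.normal(0.0, 2.0, size=(4, 4)) for _ in range(9)]
frames[4][2, 2] += 5000.0                                # simulated hit
master_bias = combine_bias(frames)
print(round(float(master_bias[2, 2]), 1))                # close to 1000, hit rejected
```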

Dark exposures

Dark exposures of 15-60 minute duration will be obtained at occasional intervals as a diagnostic check. These will probably not be used in data reduction.

Dithered Flat Field

The flat fields taken just before each exposure (Section 15.5.4.4) will of course give a trace of the position of each spectrum on the CCD. However, the count-levels in this flat-field fall to unacceptably low levels between fibers, and therefore it is inappropriate for correcting the spectrograms for pixel-to-pixel variations. We therefore plan to occasionally take a dithered flat field, in which the collimating mirror is rocked back and forth to smear the light from each fiber over a number of pixels, thus giving uniform illumination on the chip.

Sky

Sky exposures are obtained during periods not good enough for spectroscopy (too much moon, poor seeing, etc.) and are compared with flat field exposures to check the calibration system for throughput variations, baffling problems, etc.

Scattered Light

A single bright star is sent through the spectrograph to map the scattered light background. This should be done every month or so to check for dust on the spectrograph optics, lens hazing, and changes in internal baffling.

Blocking

A small bright planetary nebula (NGC 7027 comes to mind) should occasionally be put on the spectrograph to check the second order blocking; [O II] lambda 3727 should show up in second order but not in first. If we pick a known spot on the nebula, we can estimate the blocking efficiency.

Write Tape

At the end of the night, two DLT tapes are installed in the observer's workstation and a write tape routine is run. One tape is stored at Apache Point and the other is shipped to Fermilab the following Monday or Thursday by express carrier.

Instrument Log

The current plan is to provide a paper notebook that resides at the observatory for use by the spectrograph scientist and resident instrument specialist. This is the most convenient record keeping system for those working on the instrument. There might be a need for some of this information to be transmitted to others, however. For example, broken fibers must be accounted for in the object count. A summary of this information is also entered into an on-line instrument log.

Data Processing Operations at Fermilab

Introduction

The data processing must keep pace with the data collection. This implies sufficient CPU power, but also the necessary resources to validate the processing. These resources are directed at keeping track of the raw media, mounting the input and output media, maintaining the data reduction systems, submitting batch processing jobs, controlling the quality of the data reduction, and making the processed data available to "first look" scientific analysis.

Data Processing Operations

In rough chronological order, the data processing operations include the following tasks:

Data Received from APO

- fetch observing log from APO

- enter processing jobs, pending delivery of tapes

- electronically send tape packing slip to Feynman Computing Center (FCC) at Fermilab

- tapes express shipped

- tapes received at FCC; contents checked and logged; jobs released

Examination of Daily Outputs

- examine processing log; set priorities for data processing

- examine Monitor Telescope results; define times with good conditions; load operational database (opdb)

- examine photometric output; define good data segments; load opdb

- examine spectrograph output; define successful plates; load opdb

- report crashes to software maintainers

Global Quality Assurance

- compare object lists to existing catalogs

- compare object properties from different scans of same strip

- compare spectra of objects observed more than once

- compare astrometric and photometric calibration in strip overlaps

- check target efficiency -- are we finding QSOs and galaxies at the expected rates?

Weekly Status Reports

This is a weekly meeting of all data processing people, covering:

- observing progress

- processing progress

- quality assurance report for each diagnostic

- trouble reports (routine processing; calibration; science analysis)

- planning for observing

- planning for re-processing

Preparation for Spectroscopic Observing

- decide when to tile an area

- check final calibrations

- inform science working groups

- select targets in the area

- do tiling; assign targets to plates; sanity checks

- select targets of opportunity

- design plates

- send plate specification to fabricator

Loading the Science Database

- images of objects loaded as areas are accepted for tiling

- corresponding spectra as they are certified

Production System Hardware

The hardware needed to support these activities is already in hand. The SDSS production system machines consist of two DEC 8200 servers, each with 1 GBy of RAM and five 300 MHz Alpha processors. Each machine has four DLT 4000 high-speed, high-capacity tape drives and 25 nine-GBy Elite-9 disk drives, for a disk capacity of slightly more than 200 GBy per machine.

These two machines are the primary data processing machines. Up to 10 copies of the photometric pipeline will run in parallel during normal processing, allowing the processing to be completed for 8 hours of on-sky observing in no more than 16 hours. The processing can therefore keep up with the incoming data rate. Since we expect the imager to be used once every eight nights on average, there is adequate time both for reprocessing the photometric data if need be and for running the spectroscopic processing.
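
A back-of-the-envelope check of this claim, using only the figures quoted above, is sketched below.

```python
# Back-of-the-envelope check using only the figures quoted in the text: 8 hours
# of imaging processed in at most 16 hours, with the imager used about one night
# in eight, leaves ample machine time for reprocessing and for the spectroscopic
# pipelines.

HOURS_PER_IMAGING_NIGHT = 8        # on-sky imaging per photometric night
PROCESSING_RATIO = 2               # <= 16 h of processing per 8 h of data
NIGHTS_BETWEEN_IMAGING = 8         # imager used roughly once every eight nights

processing_hours = HOURS_PER_IMAGING_NIGHT * PROCESSING_RATIO
available_hours = NIGHTS_BETWEEN_IMAGING * 24

print(f"photometric processing load: {processing_hours} h of every "
      f"{available_hours} h ({100 * processing_hours / available_hours:.0f}% duty cycle)")
```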

In addition, the project has access to more than 3 TBy (an expandable figure) of tertiary storage on an IBM CAPHSM hierarchical storage system (a tape robot) served by dual 9 MBy/s IBM 3590 tape drives. Corrected imaging frames and copies of atlas images will be off-loaded onto this system.

The database machine for the SDSS consists of a six 150 MHz processor Silicon Graphics Challenge L machine, with 512 MBy of RAM and 300 GBy of spinning disk. Both the operational database and the user-accessible Science Archive will be served from this SGI machine.

Operations Computing

It is convenient to consider four categories of computing related to operations:

The rest of this section elaborates on the tasks undertaken within each of these categories, and then remarks on the required staffing level and skills, and on the required management.

Category A: Science Software Continuing Development

The premise of the SDSS is that the data are collected uniformly across the sky, and the catalog entries are similarly uniform; in particular, we do not change the selection criteria for the core-science spectroscopic targets. However, this still leaves much room for continuing development; in fact, it is inevitable that we will need to plan for a continuation of the development effort, presumably at a decreasing level as time progresses.

Aside from necessary maintenance, continuing development includes: development that enhances the efficiency at which the survey is conducted; development that yields better or additional object parameters; and development of new pipelines for enhanced science goals of the survey (cf. Section 14.3.5.10).

Support of Developed Pipelines

This activity responds to problems uncovered during data processing that need to be addressed so that the pipelines can run more reliably. These may be actual bugs, but could also include small changes to handle data in a slightly incorrect format, or clarifications of non-fatal warning messages. Proposed changes which can impact the consistency of the survey must of course be carefully considered.

Continuing Development of Core Pipelines

Additional pipeline development which does not impact the consistency of the survey can enhance the goals of the survey, e.g. measurement of quantities not used in target selection for the main samples; refined calibrations of photometry and astrometry; refined measurement of spectral features.

Some of the changes could be included in current production, with partial re-processing of data already completed. Other changes can be run from object catalogs and hence are easier to reprocess.

Develop Additional Software

We expect to develop software to create the merged pixel map, co-add data in the southern strip, or analyze special data sets such as the atlas images. One could imagine database development to accommodate new parameters, new data products, or additional cross-referencing of stored data. Continuing development of the science database could substantially improve its usefulness. We may also wish to develop software beyond what is necessary to carry out the core projects of the survey's Principles of Operation.

Support Software Infrastructure

We will require ongoing support of the software infrastructure, including cvs, upr, all of the products which combine to produce SHIVA/DERVISH, and cernlib. Support will include bug fixes, documentation updates, and help with difficult problems in the same spirit as we support the pipelines.

Support Science Toolkit

The science toolkit will allow scientists and operations staff to manipulate the data for quality assurance and science results purposes. As the survey progresses, there will be demands for routines that convert survey formats into formats that can be used in other packages, new features for better viewing/printing of data and images, etc. Presumably, scientists at each of the institutions will be writing their own routines that they will wish to share with the collaboration. These routines need to be collected and put into a distributable form.

Support Databases

It will be possible at the start to operate the survey with an operational database that is not fully implemented, since it will be only a fraction of the size of the final database. It is imagined that as the survey progresses, the demands on the speed and ease of use of the database will increase. In addition, it is possible that the schema might evolve in response to small changes in the pipeline or the need for more efficient access to the data.

Support for the databases includes some database development to facilitate its use, and support of the static system.

Category B: Science Software Operations

Running the Pipelines

A pipeline reads input data, runs processing, and creates output data. The pipelines, discussed in detail in Chapter 14, carry out the following tasks:

-- Reduce the data from the Monitor Telescope and calculate photometric solutions;

-- Calculate the astrometric solutions for each frame;

-- Reduce the photometric imaging data, to yield object lists and measurements of a variety of parameters for each detected object;

-- Apply the calibrations determined from the astrometric and photometric solutions to the detected objects;

-- Merge the object lists from the reductions of the images, pick objects for spectroscopic observations, and prepare spectroscopic plates;

-- Reduce the 2-D spectroscopic frames to create 1-D calibrated spectra, and use these to classify the objects and measure their redshifts.

For each pipeline, the following steps must be carried out: stage inputs, run the pipeline, check outputs, and run a script that stuffs the operational database. As more tools are developed these jobs can become more routine.

Staging the inputs involves collecting the observation log files (which could come directly from the mountain rather than from the data tapes), deciding where this processing falls in the priorities (possibly in collaboration with Survey Strategy), making a processing plan, and generating the pipeline inputs from the operational database, the data tapes, archives on the mountain, or other sources (such as the FIRST survey or the serendipity working group). For tracking purposes, the operational database should be loaded at this time with the input files that were obtained from tape or directly from the mountain.

Running a pipeline should be an easy submission of a batch job. The jobs are periodically monitored to ensure that they are running correctly.

The output of each pipeline is checked to see that the instrumental signatures have been removed from the data, that the data have been corrected for atmospheric signatures, that the results are self-consistent where there is overlap, and that the results are consistent with pre-existing catalogs. The tests evolve with time as we discover which things are the most important to monitor.

Stuffing the operational database should be a routine operation. However, it is very important that only good data be stuffed, since it is very difficult to remove bad data once links have been made between the data in question and the rest of the database.

The final calibration pipelines recalibrate the data that have already been added to the operational database, in general using more information than was available when the photometric pipelines were run. For example, the Monitor Telescope may by then have observed more of the secondary patch stars than were available when the photometric pipeline found, measured, and classified the objects. Along with the final astrometric and photometric calibrations, this step also includes the process of merging the objects in a calibrated run with objects from other runs that have already been loaded into the database. Since this step creates links between a very large set of data and the rest of the database, it is important that quality checks have been done as well as possible before this.

The target selection pipeline chooses the targets to be tiled, on an object-by-object basis. The potential targets and those that were selected for tiling must be exported from the operational database to the science database in a timely manner so that the collaboration will have a chance to look for unacceptable results, and to find serendipitous targets, before the tiling process is run and the plates are physically drilled. Serendipitous targets are assigned priorities before they are entered into the operational database. Each Working Group will be responsible for ensuring that their respective targets are being chosen appropriately.

Tiling is done periodically, and not on a night-by-night basis. First, the tiling pipeline is run to assign tiled targets to plates. Then, another pipeline must be run which completes the fiber assignments for each tiled plate with the target-of-opportunity targets. The pipeline also assigns sky fibers, calibration stars, guide bundles, and light trap stars. Although the order of assignments is very complicated, the pipeline is deterministic and can be run like any other pipeline. The output of this step is a list of targets and holes that are associated with a set of plates. Plate design is done in a separate step, since it requires that expected observation time and temperature be input from Survey Strategy.

The spectroscopic pipelines reduce data from the spectroscopic cameras. The 2D spectra must be flatfielded, cosmic rays must be identified and dealt with, and the 1D spectra must be extracted. The spectra must be flux calibrated, and the red and blue halves of the spectrum put together. Then, the spectra must be classified and assigned a redshift. After the spectra have been loaded into the database, they must be linked to their corresponding photometric targets.

Visually inspecting even a small fraction of the spectra would require considerable resources; hence we will ensure that enough feedback comes from the automatic process to allow us to do quality control on the data collection and processing operations.

Quality Assurance

Data quality can be checked in several ways; we will learn which tests are the most sensitive as the survey progresses. The following gives examples of checks to be undertaken routinely:

-- Correlate our catalogs (photometric and spectroscopic) with existing object catalogs in fields where such catalogs exist.

-- Compare imaging data (magnitude and position) for the same objects measured twice in the overlap regions between strips and stripes.

-- A small number of objects will be spectroscopically measured on two overlapping plates. Compare the two spectra for flux and wavelength calibration.

-- Periodically compare a small portion of the catalog with the imaging data. Print out a corrected image and overlay the locations and identifications of objects measured in the pipelines, and the locations and resulting spectra of targets.

-- Calculate the reddening map from the calibrated catalogs. Cross-check different schemes for characterizing the reddening, and compare to external results.

The data-quality criteria will be defined clearly enough that the assessment of quality can be objective and therefore applied automatically. Nevertheless, we need to implement the capability of inspection at various points in the processing, and maintain the option of over-riding an automatic declaration of data acceptability. Regardless of how the details may be implemented, in the end, the imaging data must be declared to be acceptable or unacceptable for target selection, and this evaluation must be done quickly.

Survey Strategy

We review here a number of points of connection between the routine operations of the data production system and Survey Strategy.

Some operations, such as unplugging the cartridges at the end of a night, must be based on decisions made after only a short time has elapsed. In general, the notion is to relegate the short-term evaluation to the APO staff, and the longer-term evaluation to Fermilab operations. Thus, Survey Strategy operates at both sites, but with different tools that are tailored to the needed time scales.

The strategic planning for the next dark run can be formulated at Fermilab based on the assignments of data quality. For example, if the completion of some particular segment on the sky would enable tiling for a large contiguous region that could still be observed in the current season, then that region would be given high priority for scanning in the next month, and this prioritization is communicated as a Monthly Plan to the observers at APO. By the same token, a high priority would be given for data reduction of segments that complete large, contiguous, tilable regions. In order to determine the detailed prioritizations, a sophisticated planning tool is run at Fermilab that includes not only the constraint of which data have already been obtained, but also an ephemeris; a statistical model for weather patterns; important constraints imposed by the telescope tracking; and parameters relating to airmass, temperature, etc. (see Section 14.3.10). Because the evaluation of the quality of previously obtained data is dynamic, so too is the Monthly Plan.

One of the most important decisions is the appropriate time at which to undertake a new tiling solution. The efficiency of the tiling solution (number of plates required to cover a particular area) depends on the total area tiled and its geometrical attributes. Therefore, a new tiling solution will not be run every month, but the parallel operation of the spectroscopic survey does require some minimum frequency of generating new plates. In any case, when a tiling run is undertaken, in general it will be important to follow the process through to drilled plates delivered to the mountain within a minimum period of time.

The pipelined process includes generating plate drilling instructions from the lists of targets and holes associated with each plate. This activity connects with Survey Strategy because it is necessary to input the hour angle and temperature for which the plate will be drilled. Moreover, because of the necessity to queue the plate-drilling order, the plates must be prioritized and instructions sent to the plate drillers concerning which plates are needed urgently at the mountain.

Tracking the progress of the survey is not only essential for operations, but is needed for reporting to the collaboration. The information includes which areas of sky have been scanned; of those regions, which passed the quality criteria; which regions have been tiled; which plates have been drilled; which plates have been exposed; and finally which spectroscopic exposures have passed the quality criteria. Summary information that is broadcast to the collaboration periodically can be used by researchers for defining their analyses. Formal oversight of survey operations will also require such summary progress reports for purposes of evaluating the efficiency and time-to-completion.

Database Maintenance

Occasionally, we will stuff data into the database that we will later want to remove when it is found to be unacceptable. Routines will need to be written to remove records from the database which also remove the links to the rest of the data. This operation will remove the records, but not release the disk space. Therefore, it will occasionally be necessary to copy or compress the database. Procedures for these purposes must be developed, tested, and run during the course of the survey.

Other tasks related to database maintenance include:

-- Obtain and load databases of source positions and other attributes from external surveys.

-- Work with the science working groups to define and implement acceptance tests for generated catalogs. These tests may be developed "ad hoc" within the collaboration. Some tests should be run on a periodic basis to ensure the quality of the finished catalog.

-- Prepare final catalogs for transfer to the science database. Verify that these catalogs are at least internally consistent.

-- Periodically write tapes of the current science database for distribution to collaborating institutions.

-- Apply more recent calibrations to the science database as more is known about the astrometric and photometric systems on which the data are based.

-- Maintain raw data tapes and data stored on the CAPHSM robot.

-- Copy the corrected frames to tape for distribution to the JPG.

Category C: Data Processing Operations

The following tasks will be undertaken in the Feynman Computing Center (FCC) by staff members of the Fermilab Computing Division's support infrastructure.

Running Pipelines

Receive data tapes and check them against the expected contents. Report discrepancies to the SDSS production staff. Mount tapes and release jobs to begin running the Monitor Telescope, Astrometric, and Photometric pipelines.

Systems Maintenance

Schedule routine and preventive maintenance for the SDSS hardware in FCC; devise and then execute procedures for responding to problems (for example, coordinate service calls that are required for non-scheduled maintenance); oversee the proper installation of new hardware and peripherals; analyze procedures and configurations to enhance system reliability.

Maintain operating system and Fermilab software products.

Develop and install a system to backup critical files; run this system; verify periodically that the system works.

Database Support

Install the commercial database products required by the SDSS; maintain them at the proper support level.

Work with the SDSS collaboration to understand how the database should be configured, maintained, and used for optimal performance.

Determine what procedures are necessary to ensure data integrity in case of hardware or operator errors; implement the procedures and periodically verify that they work.

Management

Data Processing Operations requires careful attention to management because many tasks are undertaken by a large and diverse staff. (We estimate a total of 15 scientist FTEs and 4 computing professional FTEs are required. Ten of the scientist FTEs and all of the computing professional FTEs must be located at Fermilab.) Moreover, the data flow from Apache Point is sporadic and unpredictable, and when data do arrive, it will often be essential to run the pipelines through to plate design quickly. Finally, there will often be problems the nature of which will require intensive effort to identify, not to mention the time necessary to track down resources to fix the problem. The coordination of this effort requires time and attention dedicated specifically to the smooth running of the pipelines.

There are a number of connections between activities at Fermilab and elsewhere, and these too need to be managed: sending the prioritized Monthly Plan to APO; sending the current "data accepted" flags to APO (to enable the APO staff to run the on-the-mountain observing planning tool); sending drilling files to the plate driller; and interacting with scientists distributed across the collaboration.

The core management team includes two scientist positions called the Survey Scientist and the Data Processing Scientist, described below.

Survey Scientist

The Survey Scientist reports on quantifiable aspects of the progress of the survey. The Survey Scientist presides over the daily phone conferences with the APO Observers and transmits the observing priorities. He or she also runs weekly phone conferences that resolve matters arising in a wider context of SDSS operations (for example, operations at Apache Point or the science database).

The Survey Scientist has the following critical operational responsibilities:

1) declare data to be acceptable for target selection and/or for loading into a database;

2) decide when to undertake a tiling solution leading to a new plate fabrication order;

3) generate strategic plans and guidance for making new observations and other operations on all time scales longer than 1 night.

The Survey Scientist oversees all of the Quality Assurance efforts at various stages throughout the processing. This task includes that of receiving, evaluating, and acting on feed-back from collaboration scientists working in the Science Database.

Some aspects of the work leading up to these results can be delegated, and the Survey Scientist works cooperatively with the Data Processing Scientist to make these work assignments, but in the end the Survey Scientist is responsible for the results.

The Survey Scientist and the Data Processing Scientist will be able to fill each other's roles when redundancy is required.

Data Processing Scientist

The Data Processing Scientist is broadly responsible for the smooth and timely running of the cycle of bringing imaging data in, and producing object lists with accurate photometry and astrometry for target selection (the time-critical part), as well as tasks that are not necessarily time-critical but are logically related. The Data Processing Scientist assigns tasks to the members of the production team, and monitors the execution of the tasks. The Data Processing Scientist keeps track of who needs what training, and schedules training sessions accordingly. (It can be anticipated that for the duration of the survey, there will be a continuing need to train new people in the SDSS software systems and in the database. There are two kinds of customers: new members of the Fermilab production system team, and local experts at the participating institutions who support scientists trying to undertake analysis. The group at Fermilab provides a natural mechanism for training and re-training these people.)

The Data Processing Scientist conducts weekly meetings of the production team that review the current status of the survey from a data processing point of view and derive any action items arising.

The Data Processing Scientist is in effect the leader of a pool of scientists, some of whom are Fermilab employees, some of whom are visitors on extended appointments, and some of whom are not resident at Fermilab. The pool concept arises because for time-critical work, especially when there is a need for a quick-turnaround of a large volume of new data, it will be necessary to have redundancy in competency to do certain tasks to mitigate absences due to illness, vacation, weekends, etc. The dynamic matching of needs to available skills is the essential aspect of this job.

The current concept is that the Survey Scientist will be a revolving position (nominally one year), recruited from the ranks of the senior scientific staff at the participating institutions. This idea provides for substantive institutional participation in the operation of the survey. The Data Processing Scientist will be recruited from within the Experimental Astrophysics Group at Fermilab, and will be assigned for the duration of the survey to provide long-term stability.


References

Bartlett, J.F., Gunn, J.E., and Kent, S.M. 1995, "Photometric and Astrometric Imaging Camera Operations."

Fukugita, M., Ichikawa, T., Gunn, J.E., Doi, M., and Shimasaku, K. 1996, AJ 111, 1748.

Kron, R.G., Newberg, H., and Stoughton, C. 1996, "SDSS Data Reduction Operations."

McKay, T. 1995, "Monitor Telescope Operations Plan."

Petravick, D., Berman, E.F., Bracker, S., Kent, S.M., and Stoughton, C. 1992, "Requirements Analysis for Survey Operations."

Uomoto, A. 1995, "SDSS Spectroscopic Operations Plan."